A sociotechnical framework to assess patient-facing eHealth tools: results of a modified Delphi process

Among the thousands of eHealth tools available, the vast majority do not get past pilot phases because they cannot prove value, and only a few have been systematically assessed. Although multiple eHealth assessment frameworks have been developed, these efforts face multiple challenges. This study aimed to address some of these challenges by validating and refining an initial list of 55 assessment criteria based on previous frameworks through a two-round modified Delphi process with in-between rounds of interviews. The expert panel (n = 57) included participants from 18 countries and 9 concerned parties. A consensus was reached on 46 criteria that were classified into foundational and contextual criteria. The 36 foundational criteria focus on evaluating the eHealth tool itself and were grouped into nine clusters: technical aspects, clinical utility and safety, usability and human centricity, functionality, content, data management, endorsement, maintenance, and developer. The 10 contextual criteria focus on evaluating the factors that vary depending on the context the tool is being evaluated for and were grouped into seven clusters: data-protection compliance, safety regulatory compliance, interoperability and data integration, cultural requirements, affordability, cost-benefit, and implementability. The classification of criteria into foundational and contextual helps us assess not only the quality of an isolated tool, but also its potential fit in a specific setting. Criteria subscales may be particularly relevant when determining the strengths and weaknesses of the tool being evaluated. This granularity enables different concerned parties to make informed decisions about which tools to consider according to their specific needs and priorities.

1.1.1. Tool functioning accurately and rapidly 1.1.1. The tool is functioning accurately and rapidly, without any error messages, glitches, or crashes (e.g., unexpected stops of running, response time) 0,00% 0 1,75%

"I think that this kind of a tool is valuable, especially in that it gives the understanding of the strengths and on the weak points of each solution… I'm strongly in favor of the scorecard view because this gives you the ability to zoom in on those items, then you know that this is something that I really need to develop in order to be more successful in this assessment" P27-In
"The scorecard is probably a bit more suitable and provides a fairer evaluation because it takes into consideration the various criteria" P28-In-TP
"I think it's a good idea to have the scorecard… because I think that there's so many different criteria, that the only way to do it is to have some kind of scoring system that you can see them very quickly, where you need to improve on" P36-PA-TP
"A global number would not mean anything, so that doesn't make sense. And depending who's using the tool, obviously, there are different priorities… the scorecard shows that this assessment is comprehensive and it takes into consideration the voices of different parties, the customer, the doctors, and so on and so forth" P34-PA-TP
"To me, the scorecard is the solution. And yes, there are tradeoffs, but you at least have higher chances to really elaborate on a more holistic view than just the single score" P46-TP
"So, kind of some of these areas might be more relevant depending on a bit on the history and the context, so that's why at least highlighting those in the kind of scorecard format instead of single score might make actually more sense" P52-TP

Favors a mix of scorecard and composite score (n = 12, 22%)
"I'm a big fan of the scorecard … the risk there is if you have a lot of solutions that are being assessed (comparability becomes challenging)" P49-TP-EE
"That's like a bit of both. So, I'm used to using scores when they'll have a sub scales, and so it'll probably give me, for example, overall symptom burden, but when I break it down… I might have a score for the physical symptoms and then the more psychological symptoms. So, I think, ideally, there will be a lot of tools work as you get an overall score. But you get sub scales are valid in their own right. And I like a bit of both" P53-HCP-Rs
"Adding an individual score is probably--but it might give a quick reference for somebody who's comparing multiple solutions to sort of say: Okay, here's the overall score now. Let me go into depth and see where that score is coming from… if they're comparing several apps at the same time" P54-Rs
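The preference expressed above for a scorecard with cluster subscales, optionally topped by an overall score, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's published scoring method: the 2/1/0 values for the three rubric levels, the equal weighting of criteria and clusters, the names `LEVEL_SCORES`, `Rating`, `cluster_subscales`, and `composite_score`, the assignment of criteria to clusters, and the example ratings are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# ASSUMPTION: numeric values for the three rubric levels; the source does not give any.
LEVEL_SCORES = {"met": 2, "partially met": 1, "not met": 0}


@dataclass
class Rating:
    cluster: str    # e.g. "technical aspects"
    criterion: str  # short criterion title
    level: str      # one of the keys of LEVEL_SCORES


def cluster_subscales(ratings):
    """Return one normalised subscale score (0.0 to 1.0) per cluster."""
    points = {}
    for r in ratings:
        points.setdefault(r.cluster, []).append(LEVEL_SCORES[r.level])
    top = max(LEVEL_SCORES.values())
    return {cluster: mean(vals) / top for cluster, vals in points.items()}


def composite_score(subscales):
    """Unweighted mean of the cluster subscales (ASSUMPTION: equal weights)."""
    return mean(list(subscales.values()))


# Hypothetical ratings for one tool; cluster assignments are illustrative only.
example_ratings = [
    Rating("technical aspects", "Functioning accurately and rapidly", "met"),
    Rating("technical aspects", "Reliable and available at all times", "partially met"),
    Rating("usability and human centricity", "Easy to navigate", "met"),
    Rating("usability and human centricity", "Visual design is appealing", "met"),
    Rating("data management", "Clear privacy policy", "not met"),
    Rating("data management", "Enables easy data deletion", "partially met"),
]

scorecard = cluster_subscales(example_ratings)
for cluster, score in scorecard.items():
    print(f"{cluster}: {score:.2f}")
print(f"overall (quick reference only): {composite_score(scorecard):.2f}")
```

Keeping the subscales as the primary output and deriving the overall number from them mirrors the "bit of both" position: the composite serves as a quick reference when comparing several tools, while the per-cluster values show where a tool is strong or weak.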

Proactive versus mass appraisal
Favors proactive appraisal (n = 36, 65%)
"I may be biased here with an investor hat on, but it's due diligence in my world. And we would never--even in the first instance, we're sourcing information from the provider. And it's the only way to do it. You have to kind of lift the stones" P6-EE-In
"I think you'll never get down to the point of really understanding what the app can offer if you don't talk to the developer. I think we've seen this firsthand in different cases" P7-EE
"I think the point that the mass appraisal is really dependent on the tool promotion, more than the tool efficacy or the tool usefulness, and I think the approach to choose the proactive appraisal, I think it's the better way to assess or to appraise the tool" P14-HCP-Rs
"I think also the other pieces when I think about it, because I'm also a clinician, I have to put hands on it to verify" P15-HCP-EE
"I would agree to the focused appraisal because the more specific the testing is, the more information you get from it and the higher the quality of the information also is" P16-HCP-EE
"I think the proactive is very important because that enables you to go beyond just the way it's marketed" P17-HCP-EE
"I don't think you can come up with ratings without actually having tried it because, again, there's bias in how things are going to be presented, and you need to see for yourself" P20-HCP-Rs
"I think that proactive, more hands-on approach is so important. Because there's also a lot that you can find online, and when you see the reality of the tool can be very different" P47-TP
"They need to look at this proactively, gather the information, look at the evidence supporting it, underpinning what the app does, the intervention, but also the evidence of the app in terms of the claims that it's making" P53-HCP-Rs
"I definitely see the value of the proactive. I can't think of any other way to not go the proactive route" P54-Rs

Disfavors proactive appraisal (n = 4, 7%)
"50 criteria are already a lot to do to an assessment in the real world. If you have an academic company, or if you have a regulatory mean, I think then 50 is easy because the more accurate, but in, as you said, with budget and stuff like that, I'm not sure if 50 or now it looks like 33. And 33 is if you do them in depth, that's a lot" P24-IE
"I mean, if for example, the solution doesn't give any data and then you have to do this by yourself, I'm not sure that some of the assessors will do that hands-on. That's a bit challenging" P50-TP-Ph

Favors a mix of proactive and mass appraisal (n = 10, 18%)
"For me, the first suggestion (mass appraisal), it could be a little bit superficial. It sounds to me like a bit desk research, but not more. But then, you would still need do this the other approach (proactive) which, of course, requires more time, but then you have a really matching feedback, which is really about platform or about the tool that was developed. I think the combination of the two would be ideal, I would say" P5-EE
"Mix does sound good. I mean, it's a nice way to also leverage these new technologies, and it lets you kind of do the foundational, but kind of you can go a step above if you want" P14-HCP-EE
"So having the screening process mass appraisal in a high-quality way is a good start. But then I think the proactive approach on top of it to validate is very valuable" P23-HCP-TP
"I think that focused versus mass--maybe a combination of the two, but then defining that criteria for what is focused and what's mass might take time, right? But I think it'll pay off eventually" P28-In-TP
"For me the mass appraisal might be the first step if you really have to give me the huge volume. But in any case… then you need the proactive part afterwards" P31-PA-EE
"This is a very critical part because, three months ago, things are completely different. So, I would be the proactive--mass appraisal doesn't work because with whatever is screening all the internet and try to get the information is not good… But now, with what's happening with the new AI revolution that is taking place and the ability to start controlling what type of information you are able to collect, it's very debatable that when we are launching these guidelines, things will completely change. So, a mass appraisal could work" P41-Ph-TP
"The focused trials, while they're necessary for, as you said, ease of use or some interoperability tests maybe, they're not required, so you can very well launch that product in the market without--or with having done very little focused appraisal. So, I agree that, in an ideal world, it is better, it is necessary, and that in the field, if we're looking at the technologies that are being launched and especially in the B2C context, I think a lot of manufacturers get away with measuring outcomes without human subjects' intervention or with very little human subjects' intervention" P42-RE

Subjective criteria
Assessor diversity (n = 34, 62%)
"I think that you would need to test it through diverse populations… Even if it's just a small group, I think that will be okay, but just to be aware of this diversity" P5-EE
"I think definitely diversity in the team assessing the tool… particularly for a patient-facing tool, a good demographic mix of patients. I think that's probably the most important" P12-HCP
"Maybe depending on who the tools might be designed for it might be interesting to have different user groups or user personas" P14-HCP-EE
"And also, thinking about diversity and inclusion. As you say, making sure that you think about this from a socioeconomic perspective, from an age perspective, from an education perspective, often from an ethnicity perspective, all of these. And you have to understand or have a reasonable level of understanding what level of variability you're going to get" P38-Ph
"So, I do think having multiple people kind of from a diversity of perspectives rate that criteria would be helpful" P45-Rs
"For instance, just taking a sample of end user and basically, they test the solution and provide the feedback" P55-Ph

Research evidence and validation (n = 20, 36%)
"All sort of user experience-related, it should be the end user has evaluated that or scored it through any number of mechanisms, through a survey through, through a usability test, through a user research study" P10-EE
"I guess, most of the vendors, they just published data about usability, for example. If they use the usability score, which is the universal measurement--and you cannot use the usability score if you don't follow the steps, and that should be a public information. And then you can say, yeah, this solution, they measure the usability, they publish this, and that's the score. For me, that's reliable data" P50-TP-Ph

Specify and explain to reach a common understanding (n = 16, 29%)
"I think making it clear is key: This is what you're supposed to be doing. This is what we're measuring, and these are your tasks" P23-HCP-TP
"There are some broad, I guess, design considerations about software which you can't ignore. If you're going to be required to navigate up and down for menus or enter data repeatedly to get to the next level, these are basic software design considerations where you can have rules--sort of you have a set of rules which are well-established rules which you follow" P38-Ph
"I think you can actually create some parameters… you can actually sort of break out into sort of: (Here's three boxes), so you could sort of narrow that down" P48-TP
"If you assess, for example, usability, and you don't follow certain procedure, you cannot say usability is low or high because you didn't follow the needed steps to assess this… when you are an assessor of usability, you should follow certain steps" P50-TP-Ph
"I would say that, I guess in general, if there are clear examples, like what is meant with some of those… that might help to steer it towards same kind of understanding and how different people approach that kind of question" P52-TP

Tool's ratings as proxy criteria (if critical mass is achieved) (n = 9, 16%)
"Customer reviews can be so important because that's just the real-world utilizers as opposed to these very motivated people who agreed to participate in a study and even know how to participate in a research study and have access to advertisements for things like that" P20-HCP-Rs
"From my experience, unless you've got something with thousands of reviews, it's pretty easy for these star ratings to be skewed, especially if somebody's annoyed with the product or the provider" P30-PA
"If there are overall ratings of usability by a critical mass and it's high, I would say that's a good indicator that it likely has good usability" P54-Rs

Tool's use metrics as proxy (n = 4, 7%)
"We had a framework on verification validation, and very recently, we were looking into--there could be a great tech tool technical components of it that they can also deliver good health outcomes, but if people don't use it, you're not able to collect the data... So, for us, when we were discussing about usability and utility criteria we were thinking about: Do they enjoy using it based on the score? Will they use it again if they had to? Did they finish the complete sessions?... Or how many times they were hitting that button on tech support? I think those were objective things that we were analyzing" P8-EE
"If you have a solution prescribed, for example, what is the rate of declining to take that into use, or what is the rate of non-registering? What is the rate of going from registration to actually use? What is the range from going to taking that actually into use and being engaged within a relevant time frame? Let's say in a 12-week engagement or something like that" P27-In

Optionality of some criteria
Favors optionality (n = 39, 71%)
"I think we can't get away from having the optionality. I feel, however, there needs to be some rationale if something is selected as not relevant or not applicable. That's, I think, the risk mitigation around having optionality" P6-EE-In
"I think you can give the option to exclude criteria and say it's not applicable but with a mandatory request for justification" P9-EE-Rs
"My impression is that we will not escape the not applicable. Because eHealth tools are so various and the use case are limitless. So, we will not escape it" P11-HCP
"I think it's good to have flexibility… because not everything is going to be black and white. There's always going to be gray, and you don't want to exclude something because it's not black or white" P12-HCP
"I would argue there is no way around actually being able to exclude stuff. And I know this makes the scientific evaluation very hard. But yeah, I have very clear opinion there" P22-HCP-TP
"I think there's so much variety in these devices and this medical solution. So yeah, I think it would also make this a little bit of optional and this is not that it makes it weaker or something like this" P34-PA-Rs

"I think that the criteria should reflect the reality of today" P11-HCP
"For the clinical decision making, this is a legal challenge that can't be met by any digital applications" P16-HCP-EE
"… in terms of where we are today and in terms of making this framework as useful and relevant today. I think it makes sense... that's the reality of today. Maybe that will be different in the next 5 to 10 years. But probably this is an evolving tool. So maybe that's something worth revisiting then" P27-In
"I think it is a living list. I think it's going to have to be evaluated on a regular basis. Something that is patient safety today might not be patient safety tomorrow. Things change, technological advances, medical advances, especially with the world we're going into genomics and AI and precision medicine and all of these domains" P28-In-TP
"I think we don't want to be dehumanizing the way that healthcare is delivered. I mean, I think that technology should always anyway augment, complement the interaction between--broadly, the interaction between humans. But obviously, in this case, between healthcare professionals and patients" P38-Ph

Favors progressive criteria (n = 3, 5%)
"It's very shortsighted… I think we are already at a point where AI is more precise. People make mistakes all the time. And the more under pressure they are, more mistakes they make… so, when you take that into consideration, this notion that somehow human knows better seems outdated" P2-EE
"I side with the visionary who's looking down the trail" P30-PA

The scorecard will automatically reflect the assessment values entered in the sheets "Core criteria - entry" and "Contextual criteria - entry". Please scroll to see radar charts.

The tool is functioning accurately and rapidly, without any error messages, glitches, or crashes (e.g., unexpected stops of running, response time)
The tool is mostly functioning accurately and rapidly, with some error messages, glitches, or crashes (e.g., unexpected stops of running, response time)
The tool is not functioning accurately and rapidly, with many error messages, glitches, or crashes (e.g., unexpected stops of running, response time)
This may sometimes be impacted by local infrastructure like wifi speed; the need here is to assess the tool's technical reliability, not that of the infrastructure (which is assessed elsewhere)
x ✓ ✓ ✓

1.b. Reliable and available at all times
The tool is reliable and available at all times and can handle high levels of traffic and usage, with backup and recovery measures in case of downtime or system failures (e.g. enabling offline functionality, functioning during energy interruption or under difficult environmental conditions)
The tool is reliable and available most of the time (with some exceptions) and can mostly handle high levels of traffic and usage, with backup and recovery measures in case of downtime or system failures (e.g. enabling offline functionality, functioning during energy interruption or under difficult environmental conditions)
The tool is not reliable and available at all times and cannot handle high levels of traffic and usage, with backup and recovery measures in case of downtime or system failures (e.g. enabling offline functionality, functioning during energy interruption or under difficult environmental conditions)

The level of evidence will be influenced by how long the tool has been on the market (i.e. tools that have been on the market longer are more capable of showing evidence than newer tools that have not yet built a user base). If the tool has only just been launched, at least a midterm evidence plan should be presented and re-evaluated after an agreed period of time (e.g. DiGAs in Germany or more recently in France).
For further guidance on assessing the quality of clinical evidence and what to look for please consult the Evidence DEFINED framework that provides a standardized approach to assessing evidence for digital health products https://www.nature.com/articles/s41746-023-00836-5 x ✓

2.b. Properly handles potentially dangerous information
The tool warns about potential risks when necessary and properly handles potentially "dangerous" information entered by a patient (e.g. when it is necessary to consult a professional), i.e. avoiding injuries to patients from the care that is intended to help them
The tool warns about some but not all potential risks and does not always properly handle potentially "dangerous" information entered by a patient (e.g. when it is necessary to consult a professional), i.e. avoiding injuries to patients from the care that is intended to help them
The tool does not warn about potential risks and does not properly handle potentially "dangerous" information entered by a patient (e.g. when it is necessary to consult a professional)
x ✓ ✓

2.c. Differentiates between clinical and technical feedback
The tool differentiates between clinical and technical feedback, clearly channels clinical feedback that may pose a health risk through the proper channels (e.g. advising the patient to call their care team, go to the ER…), reviews such feedback for vigilance and post-market surveillance purposes and, where relevant, reports it to the competent authorities
The tool does not differentiate between clinical and technical feedback, does not clearly channel clinical feedback that may pose a health risk through the proper channels (e.g. advising the patient to call their care team, go to the ER…), and does not review such feedback for vigilance and post-market surveillance purposes

3.a. User research
The tool's usability and acceptability have been rigorously trialled and tested in a real-world setting, and its effectiveness was verified by strong evidence in published scientific literature (e.g. peer-reviewed usability studies and user research)
The tool's usability and acceptability have been partially tested in a real-world setting, and its effectiveness was verified by weak evidence in published scientific literature (e.g. peer-reviewed usability studies and user research)
The tool's usability and acceptability have not been tested in a real-world setting, and its effectiveness was not verified by evidence in published scientific literature (e.g. peer-reviewed usability studies and user research)
The level of evidence may be influenced by how long the tool has been on the market (i.e. tools that have been on the market longer are more capable of showing evidence than newer tools that have not yet built a user base). Assessors are advised to have a closer look at such studies to inspect their quality (e.g. sample size, sample diversity, and rigor of the study methodology)
x ✓ ✓ ✓

3.b. Easy to navigate
It is easy to navigate through the tool (e.g. to move from one location to another and to move backwards, and the design is responsive to the screen size used)
It is not always easy to navigate through the tool (e.g. to move from one location to another and to move backwards, and the design is not always responsive to the screen size used)
It is difficult or confusing to navigate through the tool (e.g. to move from one location to another and to move backwards, and the design is not responsive to the screen size used)
For further details, there are also usability standards that the assessor may refer to, e.g. those of ISO/TC 210, the ISO technical committee for quality management and corresponding general aspects for products with a health purpose including medical devices https://www.iso.org/standard/63179.html

3.d. Visual design is appealing
The visual design is appealing and has a harmonious look and feel (including colours, and fonts are appropriately sized for the target audience)
The visual design is somewhat appealing and has a harmonious look and feel (including colours, and fonts are appropriately sized for the target audience)
The visual design is not appealing and does not have a harmonious look and feel (including colours, and fonts are not appropriately sized for the target audience)

3.e. Well structured
The tool's appearance is well structured, and important information is clear and stands out
The tool's appearance is somewhat well structured, and important information is sometimes clear but doesn't always stand out
The tool's appearance is not well structured, and important information is not clear and does not stand out
x x ✓ ✓ ✓

3.f. Evidence for user engagement
There's evidence for co-creation and collaboration with users in the tool's development (e.g. a strong and balanced advisory board with clinical/patients/technical team members able to lead the product design and development)
There's evidence for co-creation and collaboration with some users in the tool's development (e.g. advisory board is not balanced with only some but not all relevant stakeholder groups such as clinical/patients/technical team members able to lead the product design and development)
There's no evidence for co-creation and collaboration with users in the tool's development (e.g. there's no advisory board with clinical/patients/technical team members able to lead the product design and development)

✓ ✓ ✓
The assessor may add cluster-specific comments in this area...

3.g. Ongoing feedback and call to action
The tool provides appropriate ongoing feedback and appropriate call to action based on the user's state and activities (e.g. provides guidance based on user-entered information)
The tool partially provides some feedback and call to action based on the user's state and activities (e.g. provides guidance based on user-entered information)
The tool does not provide any feedback and call to action based on the user's state and activities (e.g. does not provide guidance based on user-entered information)
In some limited cases, depending on the tool's objectives and use cases, this criterion may not be applicable
x x ✓ ✓

3.h. Design appropriateness and accessibility
The tool's content and design are appropriate for the target audience and accessible to vulnerable populations (e.g. adjust text size, text to voice, colourblind colour scheme adjuster, any specificity for use by minors, offline features, left- and right-handed options), i.e. it takes the user context into account and does not vary in quality because of personal characteristics such as dexterity, disabilities, movement disorders, or vision problems.
The tool's content and design are partially appropriate for some but not all target audiences and not always accessible to vulnerable populations (e.g. adjust text size, text to voice, colourblind colour scheme adjuster, any specificity for use by minors, offline features, left- and right-handed options), i.e. it somewhat takes the user context into account and may partially vary in quality because of personal characteristics such as dexterity, disabilities, movement disorders, or vision problems.
The tool's content and design are not appropriate for the target audience and not accessible to vulnerable populations (e.g. adjust text size, text to voice, colourblind colour scheme adjuster, any specificity for use by minors, offline features, left- and right-handed options), i.e. it does not take the user context into account and varies in quality because of personal characteristics such as dexterity, disabilities, movement disorders, or vision problems.

3.i. Fosters HCP-patient interaction
The tool has the ability to foster the interaction between the health care professionals and their patients (e.g. communication features, feedback options)
The tool does not have the ability to foster the interaction between the health care professionals and their patients (e.g. there are no communication features, feedback options)
In some very limited cases where the tool being assessed is completely autonomous and is not designed to be integrated in the traditional healthcare setting, this criterion may not be applicable
x ✓ ✓

4.a. Clear privacy policy
The tool has a clear privacy policy and informs the users on how their data will be kept confidential and secured and how the data may be used (e.g., for commercial or research purposes)
The tool has a complex privacy policy and the users are not clearly informed on how their data will be kept confidential and secured and how the data may be used (e.g., for commercial or research purposes)
The tool does not have a clear privacy policy and does not inform the users on how their data will be kept confidential and secured and how the data may be used (e.g., for commercial or research purposes)

The tool respects informed consent and allows the user to opt out of data collection (e.g. ability to configure the settings of their data storage, access, and management)
The tool partially respects informed consent and allows the user to opt out of some but not all data collection (e.g. ability to configure some settings of their data storage, access, and management)
The tool does not respect informed consent and does not allow the user to opt out of data collection (e.g. ability to configure the settings of their data storage, access, and management)
✓ ✓ ✓

4.c. Data accessibility
The tool and its data can be accessed at any time and on different platforms and operating systems (e.g., Android, iOS…) ✓ ✓ ✓

4.d. Enables easy data deletion
The tool explicitly and easily enables users to delete their data
The tool enables users to delete their data but not in an easy or explicit way
The tool does not enable users to delete their data
✓ ✓ ✓

5.a. Clear info about features and use
There is clear information about the tool's features and appropriate ways to utilize it (e.g., adjunct, standalone) and the population it is designed to serve, presented in a concise way that does not overwhelm the user
There is some information about the tool's features and appropriate ways to utilize it (e.g., adjunct, standalone) and the population it is designed to serve, but it is not presented concisely and can overwhelm the user
There is no information about the tool's features and appropriate ways to utilize it (e.g., adjunct, standalone) and the population it is designed to serve
✓ ✓ ✓

5.b. Functionality is clearly identifiable
The functionality of each element is clearly identifiable (e.g. if the user must take a specific action, the tool clearly and visually indicates the action to be taken)
The functionality of some, but not all, elements is somewhat identifiable (e.g. if the user must take a specific action, the tool indicates the action to be taken, but not always in a clear way)
The functionality of some elements is not identifiable (e.g. if the user must take a specific action, the tool does not indicate the action to be taken)

5.c. Specific, measurable and achievable goals
The tool has specific, measurable and achievable goals (desired outcomes) that are specified/obvious within the tool itself
The tool has somewhat specific, measurable and achievable goals (desired outcomes) but they are not specified/obvious within the tool itself
The tool does not have specific, measurable and achievable goals (desired outcomes)
✓ ✓ ✓

5.d. Interactive features are customisable
Interactive features such as reminders, push notifications, and prompts are customisable and not overwhelming (e.g. users can customise the frequency and timing of reminders to suit their daily routines)
Some, but not all, interactive features such as reminders, push notifications, and prompts are somewhat customisable (e.g. users can customise the frequency or timing of reminders to suit their daily routines)
Interactive features such as reminders, push notifications, and prompts are not customisable
✓ ✓ ✓

Content
6.a. Content is accurate, complete, consistent, and timely
Health-related content is accurate, complete, consistent, and timely (e.g. according to state of the art scientific evidence)
Health-related content is somewhat, but not always, accurate, complete, consistent, or timely (e.g. according to state of the art scientific evidence)
Health-related content is not accurate, complete, consistent, nor timely (e.g. according to state of the art scientific evidence)
x ✓ ✓ ✓

6.b. Content is appropriate for target audience
The tool's content is provided in a clear and appropriate way for the target audience (using an understandable, plain and simple language, with messages adapted to the user profile in terms of linguistic style and level, facilitating user understanding and avoiding using technicalities)
The tool's content is somewhat, but not always, provided in a clear and appropriate way for the target audience (using a somewhat understandable, plain and simple language, with messages somewhat adapted to the user profile in terms of linguistic style and level, and not always avoiding using technicalities)
The tool's content is not provided in a clear and appropriate way for the target audience (does not use an understandable, plain and simple language, and messages are not adapted to the user profile in terms of linguistic style and level, using technicalities, not facilitating user understanding)

6.c. Sufficient information
There is sufficient information throughout the tool without any omissions, over-explanations, or irrelevant data
There is somewhat sufficient information throughout the tool, but there are some omissions, over-explanations, or irrelevant data
There is insufficient information throughout the tool; there are clear omissions, over-explanations, or irrelevant data

6.e. Content reviewed by patients
The content has been reviewed by patients to ensure readability and acceptability
The content has not been reviewed by patients to ensure readability and acceptability
✓ ✓ ✓

6.f. Quality information from credible sources
The tool contains high-quality information (e.g., text, feedback, charts, measures, references) from credible and legitimate sources (e.g., WHO)
The tool sometimes, but not always, contains high-quality information (e.g., text, feedback, charts, measures, references), and it is sometimes, but not always, from credible and legitimate sources (e.g., WHO)
The tool does not contain high-quality information (e.g., text, feedback, charts, measures, references), nor is it from credible and legitimate sources (e.g., WHO)
✓ ✓

6.g. Content reviewed by HCPs
The content has been reviewed by (or originated from) healthcare professionals with the most updated evidence-based practice of medicine
The content has not been reviewed by (or originated from) healthcare professionals
✓ ✓ ✓

6.h. Content relevant for its specified purpose
The tool's contents are relevant to the underlying objective and likely to be effective in achieving the specified purpose in the specific intended population
The tool's contents are somewhat relevant to the underlying objective and may be effective in partially achieving the specified purpose in the specific intended population
The tool's contents are not relevant to the underlying objective and not likely to be effective in achieving the specified purpose in the specific intended population
This is an adoption criterion. It is impacted by how long the tool has been on the market (the longer the tool has been on the market, the more likely it will be used and verified or endorsed by health authorities)
✓ ✓ ✓

Maintenance
The assessor may add cluster-specific comments in this area...

Periodic updates and maintenance
The tool gets periodic updates and maintenance from both technical and content perspectives (e.g., last update not older than xx months depending on the use case, the content is periodically updated with the new findings in the medical field)
The tool partially gets updates and maintenance from either technical or content perspectives (e.g., last update is relatively old, the content is only partially updated with the new findings in the medical field)
The tool does not get periodic updates and maintenance from either technical or content perspectives (e.g., the tool has not been updated in a relatively long time, the content is not updated with the new findings in the medical field)
✓ ✓ ✓

9.a. Ethical conduct
The tool's provider respects ethical conduct, clinical responsibility, and the rules and regulations protecting patients' rights and societal interests (e.g., the tool was approved or certified by a regulatory body in the case of software as a medical device, GDPR, HIPAA...)
The tool's provider only sometimes, but not always, respects ethical conduct, clinical responsibility, and the rules and regulations protecting patients' rights and societal interests (e.g., the tool was in some cases approved or certified by a regulatory body in the case of software as a medical device, GDPR, HIPAA...)
The tool's provider does not respect ethical conduct, clinical responsibility, and the rules and regulations protecting patients' rights and societal interests (e.g., the tool was not approved or certified by a regulatory body in the case of software as a medical device, GDPR, HIPAA...)
✓ ✓ ✓

9.b. Developer interaction quality
Interaction quality between the tool's provider and the users, including responsiveness, after-sales services, and customer orientation, is extremely high (e.g., provider responds to direct requests/messages swiftly and professionally)
Interaction quality between the tool's provider and the users, including responsiveness, after-sales services, and customer orientation, is acceptable (e.g., provider takes a relatively long time to respond to direct requests/messages, the response is not always very professional)
Interaction quality between the tool's provider and the users, including responsiveness, after-sales services, and customer orientation, is very low or non-existent (e.g., provider does not respond to direct requests/messages)
x ✓ ✓ ✓

9.c. Proactive approach to user needs
Demonstration of excellence in a proactive approach to the assessment of user needs, and continuous learning (e.g., provider continuously takes user feedback into account in the periodic updates and iterations of the tool and communicates this to the users)
Demonstration of some engagement in the assessment of user needs, and continuous learning (e.g., provider somewhat takes user feedback into account in the periodic updates and iterations of the tool and sometimes communicates this to the users)
Lack of proactive approach to the assessment of user needs, and continuous learning (e.g., provider does not take user feedback into account in the periodic updates and iterations of the tool)
If this information is not publicly communicated by the tool's provider, the assessor may need to directly ask about how the provider takes user feedback into account and how they communicate the resulting changes to the users

The assessor may add cluster-specific comments in this area...

Comorbidities
The assessor may add cluster-specific comments in this area...

Considers related health issues
The tool considers multiple health issues and related ones, and sufficiently addresses them to help meet the intended purpose without overwhelming the user (i.e., considers evidence-based comorbidities, and features that may improve overall quality of life, e.g., adding breathing exercises in a remote patient monitoring tool for lung cancer patients) ✓ ✓

Data definition
The assessor may add cluster-specific comments in this area...

Metadata definition, findability, and retrievability
Data findability and retrievability: Metadata definition (including e.g. units used, reference to controlled vocabularies, ontologies) and the ability for users to retrieve data with the same granularity specified by the metadata (up to and including raw data sets) ✓ ✓ ✓

Behavioral and social
The assessor may add cluster-specific comments in this area...

20.a. High quality interactive features
The tool includes high quality interactive features (enables user input and reaction) and is presented in an engaging way (e.g., contains the right mix of video/audio/text/graphics) ✓ ✓ ✓

20.b. Customisability
The tool is customisable and allows the user to control all the necessary settings and features (e.g., notifications, alerts, sounds, colours, and fonts) except for the features that form an essential part of an intervention (e.g. the tool allows the patients to customise the time of a certain reminder according to their daily routine but doesn't allow them to remove it) ✓ ✓ ✓

20.c. Persuasiveness and behavioural change
The tool is persuasive and aims at understanding what influences people's behaviour and decision making, and then uses this information to design compelling user interactions by offering relevant and customisable therapeutic activities and encouraging users to complete them (e.g., through incentivisation, gamification...) in a way that balances engagement vs tool addiction
This criterion is optional and may not be applicable depending on the specific use case of the tool being evaluated
✓ ✓

20.d. Possibilities for peer support
The tool provides possibilities for peer support and/or social networking (e.g., anecdotal evidence: a space to share experiences such as patient forums, groups, etc.) and/or is supported by patient organisations or advisory groups
This criterion is optional and may not be applicable depending on the specific use case of the tool being evaluated
✓ ✓

Adoption and implementation
The assessor may add cluster-specific comments in this area...

21.a. Implementation and user base
The tool is implemented and utilised within the target health system under usual care OR a large group of clinicians officially refers patients to utilise it (this can be checked by looking at the unique monthly users and their percentage in relation to the target population in the target health system/market). The size of the target population needs to be evidence based (e.g., if it's pilot or beta, how big is the therapeutic area? What is the size of the technology provider? When did they start selling the tool?)
This is an adoption criterion, meaning that it is impacted by how long the tool has been on the market (the longer the tool has been on the market, the more likely it will rank higher for this criterion)
✓ ✓ ✓

21.b. Feasibility of implementation planning
Assesses the extent to which the tool can be implemented as intended (i.e., feasibility of implementing the tool at a pre-determined date and time). This can be checked by looking at how long it takes, on average, from the contractual agreement until the tool is fully up and running in a healthcare organisation
This is an adoption criterion and depends on whether the tool requires a high degree of integration (i.e., may be "not applicable" for some tools). It also depends on the business model of the tool being evaluated (e.g., B2B2C can be quite complex)
✓ ✓ ✓

21.c. Favourable pre-conditions
How favourable are the pre-conditions (strategic, political, and environmental contexts) that influence the scaling up of the eHealth tool. For example, the tool's suitability to the socioeconomic context in question, considerations of foreign languages that the tool needs to support, literacy level, and the local regulatory environment such as standard reimbursement processes for eHealth tools
This is a contextual criterion. Its assessment will be different depending on the context that the tool is being considered for
✓ ✓ ✓

21.d. Visible users' reviews
The tool's visible and verified users' reviews and ratings are favourable (e.g., a star rating above 4/5 stars in the app store, or the Net Promoter Score, NPS). Users' perceived value, expressed through reviews and ratings, is used as a proxy for quality, usefulness, or acceptability and popularity
This is an adoption criterion, meaning that it is impacted by how long the tool has been on the market (the longer the tool has been on the market, the more likely it will rank higher for this criterion). This criterion is meaningful only when the tool reaches a critical mass (i.e., a large enough number of user reviews)
✓ ✓ ✓

21.e. Availability of phase-out scenarios
Availability of phase-out scenarios, if the tool's provider stops producing/maintaining the tool ✓ ✓ ✓

Provider details and experience
The assessor may add cluster-specific comments in this area...

22.a. Provider details availability
Contact details of the tool's provider are available, easy to find, and include office address, email, and team members' details ✓ ✓ ✓

22.b. Credentials of those involved in development and funding
Availability of information and credentials of the individuals and organisations involved in the development and funding of the tool (transparency on the involvement of any parties that may lead to conflict of interest, e.g., commercial sponsors and partners, financial disclosure) ✓ ✓ ✓

22.c. Provider eHealth or healthcare experience
The tool's developer and/or its core team members have specific experience in the eHealth field OR academic institution (e.g., university) OR health care system (or large health providers' organisation), e.g., an experienced medical director who oversees the algorithms/contents/features of the tool ✓ ✓ ✓

Supplementary Reference 2: Copy of Round 1 Survey (24 pages)

eHealth assessment criteria - Delphi process, Round 1

Thank you for participating in this research
You have been invited to contribute because you have been identified as a key opinion leader and a leading eHealth expert with in-depth knowledge and understanding of what it takes to achieve successful adoption and implementation of novel technologies in healthcare. Your input will help shape the discussion around assessing the quality and impact of eHealth technologies, an area that is still being shaped and evolving every day. The survey will take about 25-30 minutes to complete.

Scope: This project focuses solely on assessing patient-facing eHealth tools, including self-management tools and remote eHealth solutions, rather than tools used within and between care providers (e.g., Electronic Health Records), digital biomarkers, or health data analytics systems used at population level.

Where does the initial list of assessment criteria come from?
To minimise bias and to avoid an initial set of criteria that is impacted by the subjective opinions of the research team, we conducted a systematic literature review to objectively identify the relevant criteria used to assess the quality and impact of eHealth tools. We searched 5 databases for studies published between 2012 and 2022, yielding 675 results, of which 40 studies met the inclusion criteria. Similar assessment criteria from the different papers, frameworks, and initiatives were aggregated in 36 unique criteria grouped in 13 clusters. The resulting criteria were classified into technical, social, and organisational criteria. The technical assessment criteria were grouped in 5 clusters: technical aspects, functionality, content, data management, and design. The social assessment criteria were grouped in 4 clusters: human centricity, health outcomes, visible popularity metrics, and social aspects. The organisational assessment criteria were grouped in 4 clusters: sustainability and scalability, health care organisation, health care context, and developer, as shown in Figure 1 below. https://pubmed.ncbi.nlm.nih.gov/36843321/

We ask you to help us pressure-test and validate this list of criteria by giving us your expert opinion on:
1. Which criteria are relevant and important to be included in the validated assessment framework (only criteria rated as 4 or 5 by 75% of the expert panel will be kept). An overview of all criteria that you will be rating is shown in Figure 1 below
2. Which criteria should be removed
3. Which criteria should be added if they were missed in the initial list of criteria (you will always have the option to add new criteria in every sub-category)
4. The risk categories that are valid for each criterion (e.g., the assessment criterion "behavioural change and persuasiveness" is relevant for tier C (digital interventions), but may not be relevant for tier A (system technologies) such as e-prescriptions and e-appointment systems). An overview of eHealth tools classified by intended purpose and stratified into risk tiers according to NICE ESF is shown in Figure 2 below
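The retention rule stated in the survey (a criterion is kept only if it is rated 4 or 5 by 75% of the expert panel) can be illustrated with a minimal sketch. It assumes a 5-point rating scale and reads "by 75%" as "at least 75%"; the function names, criterion labels, and ratings below are hypothetical and are not taken from the study data.

```python
from typing import Dict, List, Tuple


def consensus_share(ratings: List[int], top_box: Tuple[int, ...] = (4, 5)) -> float:
    """Return the fraction of panel ratings that fall in the top box (4 or 5)."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(r in top_box for r in ratings) / len(ratings)


def retained_criteria(panel: Dict[str, List[int]], threshold: float = 0.75) -> List[str]:
    """Keep criteria whose top-box share meets the threshold (assumed inclusive)."""
    return [name for name, ratings in panel.items()
            if consensus_share(ratings) >= threshold]


# Hypothetical ratings from an eight-member panel (illustrative only).
example = {
    "criterion_x": [5, 4, 4, 5, 3, 4, 5, 4],  # 7 of 8 rated 4 or 5 -> 87.5%
    "criterion_y": [3, 4, 2, 5, 3, 4, 3, 2],  # 3 of 8 rated 4 or 5 -> 37.5%
}

print(retained_criteria(example))  # ['criterion_x']
```

Whether the threshold is inclusive, and how missing or "not applicable" responses are counted, would need to be confirmed against the study protocol before reusing this rule.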

Supplementary Table 1: Round 1 survey results (2 pages)
The tool provides adequate training resources for end users to ensure their comfort with basic competencies and skills needed to use the tool effectively (e.g., in the form of training material, tutorials, videos, user guides or documentation)
The tool has a clear privacy policy and informs the users on how their data will be kept confidential and secured and how the data may be used (e.g.
The tool's visible users' reviews and ratings are favourable (e.g., a star rating above 4/5 stars). Using users' perceived value through users' reviews and ratings as a proxy for quality, usefulness, or acceptability and popularity.
The tool has been verified, given a good review, or endorsed by a legitimate/reliable source such as a health organisation or health authority (e.g., APA; FDA in the US; NIH; NHS in the UK; NICE in the UK).
1.4.1.b. Compliant with applicable privacy laws
The tool explicitly reports being compliant with the relevant data privacy and protection laws (e.g., GDPR, HIPAA...), and the treatment of any personal data is compatible with the Patient Data Act, Personal Data Act, and other applicable

1.4.1.c. Respects informed consent
The tool respects informed consent and allows the user to opt out of data collection (e.g., ability to configure the settings of their data storage, access,

1.4.2.a. Allows data exchange
The tool allows exchange of data with other apps, e-tools, wearable devices, electronic health records (ability to exchange data with other systems on a

1.4.2.b. The tool allows the user to move across different platforms (e.g., mobile app vs web app, iOS vs Android)

2.1.1.a. High quality interactive features
The tool includes high quality interactive features (enables user input and reaction) and is presented in an engaging way (e.g., contains the right mix of video/audio/text/graphics)

2.1.1.b. The tool is customisable and allows the user to control all the necessary settings and features (e.g., notifications, alerts, sounds, colours, and fonts)

2.2.1. User research
The tool has been trialled and tested in a real world setting, and its effectiveness was verified by evidence in published scientific literature (e.g., usability studies and

2.2.2. Clinical evidence
The tool's clinical effectiveness is supported by strong research (e.g., preregistered RCTs, Randomised Controlled Trials) with adequate statistical power conducted by credible sources, in which the tool was found to be superior to an appropriate placebo or equivalent to acceptable evidence-based treatment

2.2.3.a. Gone through the proper certification processes
The tool clearly identifies the risks that its management may pose for user safety and has gone through the proper certification processes to ensure its safety (e.g., software as a medical device, third-party certification by a medical or governmental

Supplementary Table 2: Key themes and quotes considering the key directional decisions and challenges as expressed by the expert panel participants in the one-to-one interviews (6 pages)
"I completely agree on the optionality thing. I would say in those where criteria might be because we talked a little bit about the different use cases where it's applicable or not. So, you cannot just throw one framework at every use case, as we discussed. But I like the idea of the optionality because, as you said, it gives the point for reflection. That means basically if you skip it, you skip it why" P37-Ph
"You don't do this, you don't give it, because if you do optionality, you say that, yeah, you can misunderstand it. So, you're allowed to not understand it… So, you give way, you give leeway to the assessor" P24-IE
"I would put everything mandatory, but then I would leave a kind of open space in case the one who are doing the assessment might have some observations to include" P32-PA-EE
"This can bring a lot of variability, which I'm not sure it's good for the framework"

Assessment Criteria - short version; Assessment Criteria - complete version
The tool is customisable and allows the user to control all the necessary settings and features (e.g., notifications, alerts, sounds, colours, and fonts) except for the features that form an essential part of an intervention (e.g., the tool allows the patients to customise the time of a certain reminder according to their daily routine but doesn't

2.3.1. The tool's visible and verified users' reviews and ratings are favourable (e.g., a star rating above 4/5 stars in the app store, or the Net Promoter Score, NPS). Using users' perceived value through users' reviews and ratings as a proxy for quality, usefulness,

Table 3: Round 2 survey results (2 pages)
1.4.3.b. Reliable and available at all times
The tool is reliable and available at all times and can handle high levels of traffic and usage, with backup and recovery measures in case of downtime or system failures (e.g., enabling offline functionality, functioning during energy interruption or under difficult

Tool name
Link to the tool's website
Tool description
Developer information
Assessor profile: e.g., self-appraisal done by the tool developer, the assessor is a hospital administrator, a clinician, etc.
Date of the assessment: assessments should be periodically revised, as eHealth tools are often updated and further developed as new technologies emerge
Risk tier: A, B, or C. Please see guidance below to help define which risk tier the tool you are assessing belongs to (according to the selected risk tier, some criteria may not apply)
Assessor notes: any additional notes about the tool or the developer that the assessor(s) would like to add

Tier A tools are those

This assessment instrument and all its components are intended for educational purposes only and are not intended as legal advice. Payers have differing coverage and reimbursement policies. Laws, regulations, and health insurance policies concerning coverage, coding, and reimbursement are complex and are evolving rapidly. For legal advice, please consult with legal counsel.

The interactive assessment instrument will be available for download on the project website https://ehealth-criteria-toolbox.net/